1 Overview & Motivation

The existence of forests is essential for life on Earth. Covering around 31 percent of the world’s total land area, forests provide a retreat and home to over 80 percent of land animals and to countless plants, some of them still undiscovered. One can say that forests are the backbone of entire ecosystems. A significant part of the oxygen we breathe is provided by trees, which also absorb about 25 percent of greenhouse gases. We also depend on forests economically, as the livelihoods of about 1.6 billion people around the world are directly or indirectly connected to them. Furthermore, forests provide 40 percent of today’s global renewable energy supply, as much as solar, hydroelectric and wind power combined. Despite these benefits, forests across the world face several challenges, ranging from wildfires and human-driven deforestation to poor management and poor conservation in general. The loss of whole forests would have severe consequences for humanity and life on Earth.

With this project we seek to answer important questions that address these challenges. We want to figure out the causes of destruction of forests, highlight their importance to our environment and predict trends around reforestation/deforestation. Moreover, we hope to show how we can tackle climate change by reforestation, in particular, how an increase in forest area will help to increase the buffer of sustainability. For the statistics so far, see our reference (UN 2019).

2 Initial Questions

To begin with, we want to give a general overview of global forest development over the last 30 years. Afterwards we want to dig deeper into the topics of deforestation and reforestation by identifying the main responsible countries, showing trends and investigating possible correlations. This leads to a crucial prediction: in how many years forests would be lost if humankind continues to act as it has in the past. In the next chapter, our analysis leads us into the area of forest destruction by natural causes. After a comprehensive overview with a focus on the countries most affected by forest destruction, we want to examine whether there is a correlation between rising temperatures and wildfires, and additionally try to predict where and when wildfires are likely to occur. In our final chapter we relate our forest data to other environmental issues such as air pollution, water availability, greenhouse gas emissions and the carbon storage of forests, resulting in a prediction of how much the forest area would have to increase further to tackle all greenhouse gas emissions.

During the analysis, we considered some of the rather general questions from our proposal from different angles in order to provide answers that are more specific to different regions of the world. Furthermore, we also tried to give an overview of other environmental issues related to our questions, for example which countries have the highest air pollution or emission values.

4 Datasets & Preprocessing

Below we give an overview of the datasets we use to answer our questions and show some of the preprocessing steps we performed before the analysis.

4.1 Global Forest Resources Assessment

The Global Forest Resources Assessment contains data on forest development for the intervals between 1990 and 2020, which we mainly use to answer questions around reforestation and deforestation, as well as data on forest disturbances for the period 2000-2017 and data on total forest areas (FAO 2020a).

Data on forest development

Since the data on forest development contains only about 50% (see below) of the possible values, we had to decide how to deal with the gaps. From looking at the data and the respective report, we assumed that those values are missing not at random (MNAR). We therefore decided not to impute the missing values, so that our results are reliable in the sense that they never overestimate. However, this also means that our results for this topic underestimate the true values and that some countries have no values at all.
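A share of available values like the one quoted above can be checked with a few lines of base R. The following is only a minimal sketch on made-up data; the data frame `forest_dev` and its column names are hypothetical and do not reflect the real structure of the FAO file.

```r
# Hypothetical toy data: one row per country, NA marks a value that was not reported.
forest_dev <- data.frame(
  country       = c("A", "B", "C", "D"),
  deforestation = c(10.5, NA, 3.2, NA),
  reforestation = c(NA, NA, 1.1, 4.0)
)

value_cols <- c("deforestation", "reforestation")

# Share of reported (non-missing) values over all value columns.
completeness <- mean(!is.na(as.matrix(forest_dev[value_cols])))

# Share of reported values per column, useful to see where the gaps are.
per_column <- colMeans(!is.na(forest_dev[value_cols]))
```

Here `completeness` is the fraction of cells that hold a value, so a result of 0.5 would correspond to "about 50% of the possible values".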

Preprocessing

  • We summed up the average deforestation and reforestation values from the given intervals to get the total values for each interval and for the whole time period.
  • Furthermore we had to change some country names to the short name version, to match the country names to the world map data.
  • We also summed up the reforestation and afforestation values to express any human-made expansion of forest area.
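The aggregation steps above could look roughly like the following sketch. It assumes that the reported values are annual averages per interval, so that an interval total is the average multiplied by the interval length; the interval boundaries, column names and numbers are illustrative assumptions, not taken from the dataset.

```r
# Hypothetical annual averages (1000 ha per year) for one country and four intervals.
intervals <- data.frame(
  start         = c(1990, 2000, 2010, 2015),
  end           = c(2000, 2010, 2015, 2020),
  deforestation = c(12, 10, 8, 6),
  reforestation = c(2, 3, 4, 5),
  afforestation = c(1, 1, 2, 2)
)

# Interval length in years.
years <- intervals$end - intervals$start

# Total per interval = annual average * number of years (assumption, see lead-in).
intervals$total_def <- intervals$deforestation * years

# Human-made forest expansion = reforestation + afforestation.
intervals$total_expansion <-
  (intervals$reforestation + intervals$afforestation) * years

# Totals for the whole period 1990-2020.
total_def_1990_2020       <- sum(intervals$total_def)
total_expansion_1990_2020 <- sum(intervals$total_expansion)
```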

Data on forest disturbances

The forest disturbance data contains just about 25% (see below) of the possible values. We again assumed that those values are missing not at random (MNAR) and therefore decided to not impute those values. So for the topic of forest disturbances we will have the same limitations as previously mentioned.

Preprocessing

First we remove everything that we won’t need and make the names of the columns we need later more readable:

  • The ‘regions’ column is replaced by a continent column, which provides more intuitive information.
  • The ‘iso3’ column is removed, as it doesn’t provide any further information.
  • The name and continent columns are converted to factor variables.
  • Missing values are replaced with 0.

For the wildfire correlation, we also added a further variation of this dataset where we removed each country with missing wildfire values for any given year.
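The preprocessing of the disturbance data can be sketched in base R as follows. The toy data, the cause columns and the `continents` lookup are hypothetical; only the column names ‘regions’, ‘iso3’ and name come from the description above.

```r
# Hypothetical extract of the disturbance data (areas in 1000 ha).
disturb <- data.frame(
  name      = c("Germany", "Germany", "Brazil", "Brazil"),
  iso3      = c("DEU", "DEU", "BRA", "BRA"),
  regions   = c("Europe", "Europe", "South America", "South America"),
  year      = c(2000, 2001, 2000, 2001),
  wildfires = c(0.5, NA, 120, 150),
  insects   = c(30, 25, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table from country name to continent.
continents <- c(Germany = "Europe", Brazil = "Americas")

clean <- disturb
clean$continent <- factor(unname(continents[as.character(clean$name)]))
clean$name      <- factor(clean$name)
clean$regions   <- NULL   # replaced by the continent column
clean$iso3      <- NULL   # carries no further information

# Variant for the wildfire correlation: drop every country that has a
# missing wildfire value in any year (must happen before filling with 0).
incomplete        <- unique(disturb$name[is.na(disturb$wildfires)])
wildfire_complete <- clean[!clean$name %in% incomplete, ]

# Main dataset: replace the remaining missing values with 0.
cause_cols <- c("wildfires", "insects")
clean[cause_cols][is.na(clean[cause_cols])] <- 0
```

Note the order of the last two steps: once missing values are set to 0, countries with incomplete wildfire data can no longer be told apart from countries that reported no wildfires.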

Datasets on further topics (forest area / carbon stock)

Forest Area

This dataset contains only forest area for a few intervals.

Carbon Stock

This dataset captures the carbon stock per country for nine selected years (1990, 2000, 2010, 2015, 2016, 2017, 2018, 2019, 2020).

Preprocessing

  • Some columns were converted to a consistent data type.
  • Missing values were handled with the mice library: Years not captured in the dataset were replaced with the mean carbon stock per country.
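To illustrate what "replaced with the mean carbon stock per country" means, here is a minimal sketch in plain base R. It deliberately does not use the mice library, so it is not the code used for the analysis; the data and the year grid are made up.

```r
# Hypothetical carbon stock values for two countries and three captured years.
carbon <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(c(1990, 2000, 2010), times = 2),
  stock   = c(100, 110, 120, 50, 40, 30),
  stringsAsFactors = FALSE
)

# Build the full country x year grid (here: every 5 years, an assumption).
grid <- expand.grid(
  country = unique(carbon$country),
  year    = seq(1990, 2010, by = 5),
  stringsAsFactors = FALSE
)
full <- merge(grid, carbon, by = c("country", "year"), all.x = TRUE)

# Mean of the captured values per country, repeated for every row of that country.
country_mean <- ave(full$stock, full$country,
                    FUN = function(x) mean(x, na.rm = TRUE))

# Fill the years that were not captured with the country mean.
full$stock <- ifelse(is.na(full$stock), country_mean, full$stock)
full <- full[order(full$country, full$year), ]
```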

4.2 FAOSTAT domain forests

The next datasets are also from the FAO (FAO 2021c) and contain data on forest area changes for countries and continents from 1990-2020 including every year.

Forest area of countries
Forest area of continents


4.3 FAOSTAT domain temperature change

To work on wildfire predictions based on temperature rise, we also need data on mean surface temperature change, in this case for the period 1961-2020. The temperature increase (“Value”) given in this dataset is relative to a baseline temperature which corresponds to the period 1951–1980 (FAO 2021b).

Preprocessing

Again we remove everything that we won’t need:

  • All ‘Code’-columns: Add no information.
  • All years that are not in 2000-2017: No forest data for these years.
  • The Unit column: it’s always °C.
  • All missing values (Flags != “Fc”) and afterwards the flag column.
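The removal steps listed above can be sketched in base R like this. The toy rows and the exact column names (for example "Area Code" and the flag value "M") are assumptions for illustration; only "Value", the unit, the flag value "Fc" and the year range are taken from the description.

```r
# Hypothetical extract of the temperature change data.
temp <- data.frame(
  Area        = c("Germany", "Germany", "Germany", "Brazil"),
  `Area Code` = c(79, 79, 79, 21),
  Year        = c(1999, 2005, 2010, 2005),
  Unit        = "\u00b0C",
  Value       = c(0.9, 1.2, NA, 0.8),
  Flag        = c("Fc", "Fc", "M", "Fc"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop all 'Code' columns and the Unit column.
keep_cols <- !grepl("Code", names(temp)) & names(temp) != "Unit"

# Keep only the years covered by the forest data and rows flagged "Fc".
keep_rows <- temp$Year >= 2000 & temp$Year <= 2017 & temp$Flag == "Fc"

temp_clean <- temp[keep_rows, keep_cols]

# The flag column is no longer needed after filtering.
temp_clean$Flag <- NULL
```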

4.4 OECD air quality and health

Our fourth dataset was created by the OECD and holds data on mean population exposure to outdoor and ambient PM2.5 particles, given in \(\mu g/m^3\), for the period 1990-2019 (OECD 2021). A few years within this period are not included in the data.

4.5 Climate Watch GHG Emissions

The CAIT dataset contains data on greenhouse gas (GHG) emissions for the period from 1990-2018 (Climate Watch 2020).

Preprocessing

  • We used only the “Total including LUCF” sector.
  • Some columns were renamed and irrelevant columns were dropped.
  • The structure of the dataset was changed to have a year and emission column.
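The restructuring into a year and an emission column is a wide-to-long reshape. A minimal base R sketch, assuming a hypothetical wide layout with one column per year (the column names and numbers are made up; only the sector name comes from the list above):

```r
# Hypothetical wide extract: one row per country and sector, one column per year.
ghg <- data.frame(
  Country = c("A", "A", "B", "B"),
  Sector  = c("Total including LUCF", "Energy",
              "Total including LUCF", "Energy"),
  Unit    = "MtCO2e",
  `1990`  = c(100, 80, 50, 30),
  `2018`  = c(120, 90, 45, 28),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only the sector used in the analysis.
total <- ghg[ghg$Sector == "Total including LUCF", ]

# All columns whose name is a four-digit year.
year_cols <- grep("^[0-9]{4}$", names(total), value = TRUE)

# Stack the year columns: one row per country and year.
ghg_long <- data.frame(
  country  = rep(total$Country, times = length(year_cols)),
  year     = rep(as.integer(year_cols), each = nrow(total)),
  emission = unlist(total[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
```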

4.6 FAO AQUASTAT database

Our last dataset is the AQUASTAT database from the FAO, which provides annual averages of precipitation and renewable water resources for the period 1961-1990 (FAO 2021a).


5 Global Forest Development

The first chapter of our project gives an overview of global forest development in the last 30 years.

5.2 Countries with the largest forest area

Before we continue with our deeper analysis, we also want to highlight which countries hold the most forest area on our Earth.

The results are not really surprising, as those countries also have the largest land areas worldwide. However, comparing for example Russia’s forest area with the forest area of the rest of the world outside the top ten countries gives quite impressive results. Moreover, the top 10 countries together hold more forest area than the rest of the world combined.

6 Deforestation

In our analysis we now attempt to answer important questions regarding deforestation, which is actually a part of forest destruction. However, we want to highlight this issue in its own chapter as it is directly caused by humans. We will therefore look at the main drivers (by country) of deforestation, show trends across the continents and predict how many years it would take until all forests are lost by relating the deforestation and reforestation values of the last 30 years to each other.

6.1 Main drivers of deforestation and forest prediction

By hovering over the following map you can find the data for every country regarding the percentage of forest lost through deforestation in the last 30 years in relation to 1990, the value for deforestation in the last 30 years, the current forest area [1000 ha] and the number of years in which the forest area will be completely lost, given the deforestation for each country in the last 30 years.

The color scale shows the percentage of forest lost, which makes the main drivers of deforestation visible.

Note: for some countries the deforestation data is 0 or not available. If possible, for those countries the deforestation value was calculated by the difference of the forest area between 1990 and 2020. Otherwise they are colored white.
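The values shown in the tooltip can be derived with simple arithmetic. The sketch below is one possible reading of the description, assuming that the deforestation of the last 30 years continues at the same average annual rate; column names and numbers are hypothetical, and clamping a net forest gain to 0 in the fallback is our own assumption.

```r
# Hypothetical values in 1000 ha.
forest <- data.frame(
  country   = c("A", "B", "C"),
  area_1990 = c(1000, 500, 200),
  area_2020 = c(700, 450, 200),
  total_def = c(300, NA, 0),   # deforestation 1990-2020
  stringsAsFactors = FALSE
)

# Fallback for countries with 0 or missing deforestation data:
# use the net loss of forest area between 1990 and 2020.
net_loss     <- pmax(forest$area_1990 - forest$area_2020, 0)
use_fallback <- is.na(forest$total_def) | forest$total_def == 0
forest$def_30y <- ifelse(use_fallback, net_loss, forest$total_def)

# Percentage of forest lost in relation to 1990.
forest$pct_lost <- 100 * forest$def_30y / forest$area_1990

# Years until the current forest area is gone at the average annual rate.
annual_def <- forest$def_30y / 30
forest$years_left <- ifelse(annual_def > 0,
                            forest$area_2020 / annual_def,
                            NA)   # no loss: no prediction (white on the map)
```

For country A this gives 300 / 30 = 10 thousand ha per year, so the remaining 700 thousand ha would last 70 years.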

The two following bar plots are intended to illustrate the result once again: first, the top 20 countries that do not seem to have a problem with the current level of deforestation, given their forest area, and then the top 20 countries that could lose their forest area within the next 100 years due to too much deforestation.

Countries that are located in the subtropical zone and hold a lot of rainforest, such as Nicaragua, Indonesia, Paraguay, Uganda or Ivory Coast, should be highlighted here. Since we were also wondering why Brazil is missing from this list (given recent reports), we noticed later in our analysis that its area of burned forest is extraordinarily high. Assuming that the burned forest area is attributed to deforestation rather than wildfires, Brazil would lose its forests far earlier than in 133 years.

7 Reforestation

In this part we want to dig deeper into the topic of reforestation by analyzing which countries are the main drivers and whether there is a correlation between reforestation and deforestation, and by showing the trends across continents.

7.1 Main drivers of reforestation

As these results mainly show countries with a large land area, we want to put the reforestation increase from 1990-2020 in relation to the forest area in 1990.
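The effect of this normalization can be seen in a tiny sketch with two made-up countries (names, columns and numbers are hypothetical):

```r
# Hypothetical values in 1000 ha.
refo <- data.frame(
  country   = c("Large", "Small"),
  area_1990 = c(100000, 1000),
  total_ref = c(2000, 400),     # reforestation 1990-2020
  stringsAsFactors = FALSE
)

# Reforestation as a percentage of the forest area in 1990.
refo$ref_pct_1990 <- 100 * refo$total_ref / refo$area_1990

# Rank by relative instead of absolute reforestation.
ranking <- refo[order(-refo$ref_pct_1990), ]
```

In absolute terms "Large" reforests five times as much, but relative to its 1990 forest area "Small" ranks first.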

This leads to quite surprising results, e.g. that Algeria is one of the countries with the highest reforestation increase relative to its total forest area in 1990. This doesn’t come out of nowhere: along with other North African countries, Algeria is pursuing several reforestation projects such as the great green wall or “barrage vert” (Göbel 2021). As a result, Algeria is one of the countries with a higher forest cover in 2020 than in 1990.

By hovering over the following map, you can see a tooltip with the reforestation increase for each country. The greener the country, the higher the increase.

7.2 Relation between reforestation and deforestation

Now that we have an overview of reforestation and deforestation, we want to answer the question of whether governments try to “make up” for the deforestation of the last 30 years.

First of all we want to show the relation of total reforestation and deforestation in the last 30 years, to get a first impression.

There are many outliers with either very high deforestation or very high reforestation figures. However, such outliers are not surprising when taking a look at recent news. For example Brazil, a country with one of the biggest rainforest areas, has been making negative headlines for years with its environmental policy (ONLINE 2021).

We now zoom in to take a closer look at the data without the outliers by reducing the axis range.

The figure already shows that the data are widely spread and not linearly related.

With the Shapiro-Wilk test we check whether our data are normally distributed.

## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totalref
## W = 0.21089, p-value < 0.00000000000000022
## 
##  Shapiro-Wilk normality test
## 
## data:  corref$totaldef
## W = 0.16293, p-value < 0.00000000000000022

The p-values are below 0.05 for both reforestation and deforestation, so the data deviate significantly from a normal distribution. This result was already suggested by the graph.
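For readers who want to reproduce this kind of check, `shapiro.test()` from base R's stats package can be applied to any numeric vector. The following sketch uses simulated right-skewed data as a stand-in; it does not use the `corref` data from the output above.

```r
set.seed(1)

# Hypothetical, strongly right-skewed values (a few very large outliers),
# similar in shape to country totals dominated by a few large countries.
skewed <- rlnorm(200, meanlog = 0, sdlog = 2)

res <- shapiro.test(skewed)

# W close to 1 would indicate normality; a p-value below 0.05 means the
# hypothesis of normally distributed data is rejected.
w_stat       <- unname(res$statistic)
reject_normal <- res$p.value < 0.05
```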

As the data are therefore not normally distributed and do not follow a linear relationship, we choose the rank-based Spearman method to calculate the correlation.

With a value of 0.54 the Spearman coefficient shows a fairly strong positive correlation, which indicates that a relationship between deforestation and reforestation exists.
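A Spearman correlation can be computed with `cor(..., method = "spearman")` from base R. The sketch below uses simulated skewed data (the vectors `def` and `ref` are hypothetical, not our dataset) and also shows why the method suits such data: it only depends on ranks, so a monotone transformation like the logarithm does not change it.

```r
set.seed(42)

# Hypothetical skewed totals with a monotone but non-linear relationship.
def <- rlnorm(100, meanlog = 0, sdlog = 2)
ref <- sqrt(def) * rlnorm(100, meanlog = 0, sdlog = 1)

# Rank-based correlation.
rho <- cor(def, ref, method = "spearman")

# Same coefficient after a log transformation, because the ranks are unchanged.
rho_log <- cor(log(def), log(ref), method = "spearman")

# Pearson, in contrast, is sensitive to the outliers and the scale.
r_pearson <- cor(def, ref, method = "pearson")
```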

Additionally, we calculated the Predictive Power Score to quantify the impact of deforestation on reforestation. However, with 0.13 this value is not as high as expected after the correlation result.

8 Forest Destruction

The next part of our project is about forest destruction and tries to answer what the main causes of forest destruction are and which countries are affected the most. We also take a look at the relationship between wildfires and rising temperatures and try to predict where and when wildfires are likely to occur.

For the first two questions we decided to look at a global scale and for the country Germany.
For the third question we decided to look at a global scale and for the continent Europe.

Note: In our interactive shiny website you can pick the country / continent of your interest.

8.1 Main causes of forest destruction

On a global scale, wildfires are clearly the dominant cause of forest destruction over the years. However, this does not apply to every individual country. For example, Germany’s main cause of forest destruction is insects. One can also see that there is no obvious trend over this short time scale.

Note: the peak in the plot about Germany was caused by the heat wave in 2003.

Destroyed forest by cause

Again we can see the huge impact of wildfires. On a global scale they make up more than 50% of forest destruction, destroying more than 1 billion ha of forest. Insects make up almost 25% of forest destruction, destroying roughly 500 million ha of forest. These two are clearly the main drivers of global forest destruction.

In Germany, insects and diseases are the main drivers of forest destruction, destroying approximately 2.5 million ha of forest over this 18-year period. Wildfires, however, are not a significant problem in Germany.

8.2 Most affected countries

As one would expect from our previous findings, the most affected countries mostly struggle with wildfires. The only exceptions to this are the USA, Canada, China, Sudan and Mexico. Brazil is the most affected country and has a huge wildfire problem. However, since Brazil is a tropical region and therefore very humid, these wildfires are most likely caused by humans (ONLINE 2021). For Europe one can see that, except for Russia, the most affected countries have no problem with wildfires.

8.3 Relation between rising temperatures and wildfires

Overview:

Our first visualization doesn’t suggest any linear correlation, but it can be improved to make the interpretation clearer. Again, the temperature increase given in this dataset is in relation to a baseline temperature which corresponds to the period of 1951–1980.

Next we compare global yearly temperature changes to the global forest area destroyed by wildfires.

There is no linear correlation visible between rising temperatures and either land or forest fires.

Now we compare global yearly temperature changes to the global count of wildfires.

Again there is no linear correlation visible between rising temperatures and either land or forest fires. Maybe we can find a correlation if we go into more detail and show not only global values but values for each country and each year:

## [1] "Kendall =  0.464052287581699"
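A Kendall coefficient like the one printed above can be obtained with `cor(..., method = "kendall")`. As a minimal sketch on made-up yearly values (the vectors below are hypothetical and not the data behind the printed result), the coefficient is the number of concordant pairs minus the number of discordant pairs, divided by the number of all pairs:

```r
# Hypothetical yearly temperature changes and wildfire measures for five years.
temperature <- c(0.6, 0.7, 0.8, 0.9, 1.0)
wildfires   <- c(2, 1, 4, 3, 5)

tau <- cor(temperature, wildfires, method = "kendall")

# Manual check: compare the ordering of every pair of years.
pairs      <- combn(length(temperature), 2)
signs      <- sign(temperature[pairs[2, ]] - temperature[pairs[1, ]]) *
              sign(wildfires[pairs[2, ]]   - wildfires[pairs[1, ]])
tau_manual <- sum(signs) / ncol(pairs)
```

Of the 10 pairs of years, 8 are ordered the same way in both series and 2 are not, which gives (8 - 2) / 10 = 0.6. The manual formula only matches `cor()` when there are no ties.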

We need a way to quantify these results. Since the data is clearly not linear, we use the Predictive Power Score [..] as a measure of correlation.

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 9 observations in each test-set for the Wildfires-Temperature relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 9 observations in each test-set for the Temperature-Wildfires relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Wildfires-Temperature relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 3.6 observations in each test-set for the Temperature-Wildfires relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Note: name was forced from character to factor.
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in 2 * precision * recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in 2 * precision * recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in 2 * precision * recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in 2 * precision * recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in 2 * precision * recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Warning in precision + recall: Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes
## Note: name was forced from character to factor.
## Warning in diag(cm)/rowSums(cm): Länge des längeren Objektes
##       ist kein Vielfaches der Länge des kürzeren Objektes

Our data show no recognizable correlation between temperature increase and wildfires.

8.4 Prediction of where and when wildfires are likely to occur

First we look at the available data and see which columns might help us to make a prediction.

Based on the columns in the table above, we try to predict whether a wildfire occurs. As predictors we use the location, year, forest composition, destruction information and temperature increase.

We have roughly 3800 samples for the prediction.

## x Fold2: preprocessor 1/1, model 1/1 (predictions): Error in model.frame.default(...

Our most accurate prediction depends only on the country and is therefore not very useful for finding the underlying causes of wildfires, even though it has an accuracy of 92%. If we remove the country as a predictor, the main factors for deciding whether a wildfire occurs are the type of forest in a given country and the amount of forest destroyed by insects. Neither value needs to be very high to result in a 70% chance of a wildfire in a given country.
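Accuracy alone can hide how well each class is predicted, so it is often reported together with per-class precision, recall and F1. The following is a minimal Python sketch of how these metrics can be derived from a confusion matrix. It is not the code used for this report; the function name `classification_metrics` and the example counts are hypothetical.

```python
def classification_metrics(cm):
    """Return overall accuracy and per-class precision, recall and F1.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows: actual, columns: predicted).
    """
    n = len(cm)
    if any(len(row) != n for row in cm):
        raise ValueError("confusion matrix must be square")

    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    per_class = []
    for i in range(n):
        true_pos = cm[i][i]
        actual = sum(cm[i])                          # row sum
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        recall = true_pos / actual if actual else 0.0
        precision = true_pos / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append({"precision": precision, "recall": recall, "f1": f1})
    return accuracy, per_class


# Hypothetical counts, not taken from the report:
# class 0 = "no wildfire", class 1 = "wildfire".
cm = [[50, 10],
      [5, 35]]
accuracy, per_class = classification_metrics(cm)
print(accuracy, per_class)
```

Computing every metric per class from the same square matrix keeps all vectors the same length, which avoids mixing results of different sizes.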

9 Relation to other environmental issues

In this section, we address some aspects that are relevant to climate change, such as air pollution and greenhouse gases (GHG), and provide an outlook on how the development of forest area can affect them. In particular, we give an overview of GHG emissions and the carbon stock in forests all over the world and then calculate the correlation between those two values. We then investigate how much GHG would be absorbed by an increasing forest area. For air pollution, we also want to determine its relationship with forest area. Finally, we predict how much forest area would be needed to tackle current CO2 emissions by country.

9.1 Relation between forest area and air pollution

We start with an overview of air pollution: the plot below shows the top 10 polluters together with the global average. The list is relatively evenly distributed between countries with large and small forest areas relative to their total area.

## Selecting by avgAirPollution

India has had the highest air pollution over the last 30 years, so we take a closer look at India's air pollution figures compared to the global figures.

Air pollution trend in India

Findings:
* Air pollution drastically increased after the year 2005.
* It was highest in 2012.
* It decreased in 2005 by ~10 micrograms per cubic metre.
* Sudden decrease around 2017.
* Increased slightly in 2019 again.

## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

We notice that India's air pollution ranges between 65.3201 \(\mu g/m^3\) (lowest) and ~95.8366 \(\mu g/m^3\) (highest), while the average air pollution of other countries ranges between 5.1633 \(\mu g/m^3\) and 66.9532 \(\mu g/m^3\). However, the global averages contain many outliers, showing that many countries have air pollution above the maximum average value.

Furthermore, we try to find a relationship between forest area and air pollution. The correlation value confirms our assumption that there is no relationship here.

## [1] "Kendall =  -0.032967032967033"

The results above indicate that there is no correlation between Forest Area and Air Pollution.
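For readers who want to see what a Kendall coefficient measures, the following minimal Python sketch counts concordant and discordant pairs and applies the tie-corrected tau-b formula. It is an illustration only, not the code behind the value reported above; the function name `kendall_tau_b` and the sample values are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b) via a simple O(n^2) pair count."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                 # tied in both variables
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1          # pair ordered the same way
            else:
                discordant += 1          # pair ordered the opposite way

    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        return float("nan")              # one variable is constant
    return (concordant - discordant) / denom


# Hypothetical values, not taken from the report's datasets.
forest_area = [1, 2, 3, 4, 5]
air_pollution = [2, 1, 4, 3, 5]
tau = kendall_tau_b(forest_area, air_pollution)
print(tau)
```

A value close to 0, as in the result above, means that concordant and discordant pairs occur about equally often.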

For a further verification of the relationship between Forest Area and Air Pollution, we also use the Predictive Power Score.

## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the ForestArea-AirPollution relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.
## Warning in score(df, x = param_grid[["x"]][i], y = param_grid[["y"]][i], : There are on average only 2.8 observations in each test-set for the AirPollution-ForestArea relationship.
## Model performance will be highly instable. Fewer cv_folds are advised.

The Predictive Power Score of 0 also shows no relationship between forest area and air pollution.

9.2 Relation between available water / precipitation and forest area

We now take a look at two relationships between forest area and water. We decided to go for a global instead of a country-wise analysis to keep this chapter consistent. First we want to find out whether there is a correlation between long-term average annual precipitation and the average forest area between 1990 and 2020.

Looking at the plot above, there seems to be a clear relationship between rainfall and forest area. To quantify this impression, we again calculate a correlation score. In this case, neither variable is normally distributed, as the very low p-values of the Shapiro-Wilk tests show. Therefore we use the Spearman correlation coefficient.

## 
##  Shapiro-Wilk normality test
## 
## data:  water_forest_data$`Long-term average annual precipitation in volume`
## W = 0.37658, p-value < 0.00000000000000022
## 
##  Shapiro-Wilk normality test
## 
## data:  water_forest_data$ForestArea
## W = 0.2588, p-value < 0.00000000000000022
## [1] "Spearman =  0.930861096154095"

The results match the first impression: there is a very strong positive correlation (0.9308611) between long-term average annual precipitation and long-term average forest area.
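The Spearman coefficient is the Pearson correlation of the ranks of the two variables, which is why it does not require normally distributed data. A minimal Python sketch under that definition follows. The helper names and the sample values are hypothetical, and this is not the code used for the report.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1         # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical, strongly skewed values (not from the report's datasets).
precipitation = [10, 200, 3000, 40000]
forest_area = [1, 4, 9, 100]
rho = spearman(precipitation, forest_area)
print(rho)
```

Because only the ordering matters, a few very large countries do not dominate the coefficient the way they would with a Pearson correlation on the raw values.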

Next we want to look at the relationship to the total renewable water resources a country has available per year.

Again there seems to be a strong relationship. Neither variable is normally distributed (see below), so we use the Spearman correlation coefficient once more.

## 
##  Shapiro-Wilk normality test
## 
## data:  water_forest_data$`Total renewable water resources`
## W = 0.34162, p-value < 0.00000000000000022
## [1] "Spearman =  0.880769586303481"

The results show that there is also a very strong positive correlation (0.8807696) between the total renewable water resources a country has available per year and the long-term average forest area of a country.

Although these results are not really surprising, they give a clear impression of how dependent our forests are on the amount of available water and rainfall. Since the number of extreme weather phenomena increases with climate change, this could lead to further trouble for the world's forests.

9.3 Relation between greenhouse gas emissions and carbon stored in forests

Analysis and Visualizations of GHG Emissions

To start with we visualized the top 20 emitters of GHG globally, within the time period 1990-2018. The result below shows China at the top, a country which (as we have already shown) has one of the highest forest area increases between 1990-2020.

As seen below, we also looked at the proportions of the individual greenhouse gases:

Furthermore, we calculated the average emissions per country from 1990 to 2020, which are visualized in the map below.

Analysis and visualization of carbon stock

The next important variable is the carbon stock held globally, visualized here as a trend line for the period 1990-2018. To put it into context, we show the GHG trend directly below it. One can already see that the carbon stock decreased sharply, especially in the 1990s, while GHG emissions have been increasing strongly since 1999.

Before we calculate the correlation, we first look at the countries with the largest carbon stock. Not surprisingly, these are also the countries with the largest forest areas.

Correlation

To answer the correlation question, we start by comparing both variables per year.

Correlation result

## [1] "Kendall =  -0.954415954415954"

We decided to use the Kendall method to calculate the correlation because the dataset is small and not normally distributed. We find a very strong negative correlation of -0.95. One possible interpretation is that a decreasing carbon stock leads to increasing GHG emissions. However, this relationship cannot be established from the correlation alone, because correlation does not necessarily imply causation.

Relation between absorbed gas and forest area increase

That is why we want to relate GHG emissions to forest area. The objective of our analysis is to investigate how much GHG would be absorbed if the forest area were increased. To do so, we use the carbon stock and forest area datasets introduced earlier, which we visualize below.

Using linear regression, we conclude that for a 1000 ha increase in total forest area, the amount of carbon absorbed increases by 0.0024 MtCO2, all other variables held constant. Put differently, it would lead to a decrease in CO2 of 0.0024 Mt.
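The regression coefficient quoted above is the slope of a fitted line: the expected change in the response for a one-unit change in the predictor. As a rough illustration, here is a minimal Python sketch of an ordinary least squares fit with a single predictor. The function name `fit_line` and the data points are hypothetical and unrelated to the report's datasets, and the sketch is not the model code used for the report.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx                    # change in y per unit of x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values in arbitrary units:
# x could stand for forest area, y for carbon stock.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

If x is measured in units of 1000 ha and y in Mt, the slope reads directly as "Mt per additional 1000 ha", which is the form in which the result above is stated.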

9.4 Required forest area for current CO2 emissions

The map below highlights the countries that would have to increase their forest area the most to tackle their current CO2 emissions. As our dataset only provides CO2 emission data for 1990-2018, we used an ARIMA time series model to predict future CO2 emissions. Assuming that one acre of forest can absorb about 2.5 tons of carbon annually, we use the predicted CO2 value to find the required amount of additional forest area.
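The arithmetic behind this step can be sketched as follows in Python. The sketch applies the 2.5 tons per acre figure directly to an emissions value, as described above, and converts acres to hectares. The function name, the optional `existing_forest_ha` parameter and the example emissions value are assumptions made for illustration; the ARIMA forecast itself is not reproduced here.

```python
ABSORPTION_TONS_PER_ACRE = 2.5        # annual figure quoted in the text
HECTARES_PER_ACRE = 0.40468564224     # 1 acre = 4046.8564224 m^2


def required_forest_area_ha(emissions_tons, existing_forest_ha=0.0):
    """Forest area (ha) still needed to absorb the given annual emissions.

    existing_forest_ha is a hypothetical parameter: whether and how the
    report subtracts existing forest is not stated in the text.
    """
    if emissions_tons < 0 or existing_forest_ha < 0:
        raise ValueError("inputs must be non-negative")
    required_acres = emissions_tons / ABSORPTION_TONS_PER_ACRE
    required_ha = required_acres * HECTARES_PER_ACRE
    return max(required_ha - existing_forest_ha, 0.0)


# Hypothetical predicted emissions of 1,000,000 tons per year.
area = required_forest_area_ha(1_000_000)
print(round(area, 2))
```

With these assumptions, every additional ton of predicted emissions requires 0.4 acres, or roughly 0.16 ha, of forest.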

10 Final Analysis

The bottom line of what we have learned from our data is that forests change incredibly fast. Even within our time range of the last 30 years, we obtained many fascinating findings. In general, we answered our questions with exploratory data analyses, regression models and correlations. To justify our results, we tried to look at our data from at least two sides. In these cases, we were able to reinforce a previous result with a further calculation or visualization (e.g. continental and country-wise results).

Starting with the forest area trends, we found huge differences between countries and even continents in the development of forest area. While the Americas (in particular South America) and Africa faced a decrease, Asian countries slightly increased their forest area. The main reasons were the extensive deforestation in Brazil with 19%, while China recorded a much larger forest area in 2020 than in 1990. The forest trend in Europe over the last 30 years is stable, even slightly upwards.

Our map with the deforestation analysis provides a new perspective on the world. It shows the loss of forests in the past 30 years and the number of years until the forests are completely lost if a country keeps deforesting at the same pace. However, these numbers should be read with the understanding that the amount of deforestation is declining on almost every continent. For example, countries in the Americas deforested around 7 billion ha between 2000 and 2010; the figure declined to 3.1 billion ha between 2010 and 2020. When relating the deforestation results to reforestation, we found a moderate correlation, suggesting that deforestation actually has an impact on reforestation. One could argue that at least some countries want to "make up" for their human-driven deforestation. It would also be interesting to relate the reforestation numbers to other causes of forest destruction, to show which countries are willing to tackle climate change.

When it comes to natural forest destruction, we found that wildfires are clearly the dominant cause with a 50% share. However, this varies between countries: in Germany, for example, the main cause of forest destruction is insects, in Brazil it is fire. We have to mention here that our dataset does not distinguish between naturally caused and human-driven fires. Based on recent articles and the fact that Brazil is located in a tropical zone, we can assume that the fires there are human-driven deforestation. That is also one reason why our question of when and where wildfires are likely to occur is quite tough to answer: a more precise dataset that distinguishes between the causes of fire is missing.

Finally, we investigated another large subject area: forests and their relation to other environmental issues. We were able to calculate the decrease in CO2 (0.0024 MtCO2) if forest area is increased by 1000 ha, and we were able to determine how much forest each country would have to plant to absorb its CO2 emissions.

Since articles and papers deal very specifically with individual topics around forests, our aim from the beginning of the project was to give an overview of the most important issues using our own data and calculations. We are aware that we could not cover everything, and we would have liked to answer many more questions that came to mind before or during the project. However, this would have been beyond the scope of our project. A look into the future is also intriguing. Besides showing whether current positive or negative trends persist, more years of collected data would open completely new possibilities to answer further questions and make even more precise predictions.

Resources

Climate Watch, CAIT data: 2020. “GHG Emissions.” Washington, DC: World Resources Institute. 2020. https://www.climatewatchdata.org/ghg-emissions.
FAO. 2020a. “Global Forest Resources Assessment 2020.” 2020. https://fra-data.fao.org/WO/fra2020/home/.
———. 2020b. “Global Forest Resources Assessment 2020: Main Report.” Rome: FAO. https://doi.org/10.4060/ca9825en.
———. 2021a. “AQUASTAT Core Database.” FAO. 2021. http://www.fao.org/aquastat/statistics/query/index.html?lang=en.
———. 2021b. “FAOSTAT Temperature Change Dataset.” Rome Italy: FAO. 2021. http://www.fao.org/faostat/en/#data/ET.
———. 2021c. “FAOSTATC Climate Change,emissions, Fires.” Rome Italy: FAO. 2021. http://www.fao.org/faostat/en/#data/GF.
Field, Christopher B, and Vicente R Barros. 2014. Climate Change 2014–Impacts, Adaptation and Vulnerability: Regional Aspects. Cambridge University Press.
Göbel, Alexander. 2021. “Ein Grüner gürtel Gegen Die Sandige wüste.” Tagesschau.de. https://web.archive.org/web/20100222172301/http://www.tagesschau.de:80/ausland/sahelzone100.html.
Messier, Christian, Klaus J Puettmann, and K David Coates. 2013. Managing Forests as Complex Adaptive Systems: Building Resilience to the Challenge of Global Change. Routledge.
OECD. 2021. “Air Quality and Health: Exposure to Pm2.5 Fine Particles - Countries and Regions.” https://doi.org/10.1787/96171c76-en.
ZEIT ONLINE. 2021. “Abholzung Im Amazonasgebiet Steigt Auf Neuen höchststand.” ZEIT ONLINE. https://www.zeit.de/wissen/umwelt/2021-06/brasilien-abholzung-amazonasgebiet-regenwald-mai-hoechststand.
Ritchie, Hannah, and Max Roser. 2021. “Forests and Deforestation.” Our World in Data.
UN, Economic Commission for Europe. 2019. “10 Facts to Fall in Love with Forests.” Unece.org. https://unece.org/forestry/news/10-facts-fall-love-forests#:~:text=1.